We use a two-component hurdle model: the first component predicts whether a disease is present (binary) and, if it is present, the second component predicts the case count (integer). Here we compare the results of a boosted tree model (xgboost) with those of our baseline model.
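A minimal Python sketch of how the two components combine at prediction time. The function name `hurdle_predict`, the 0.5 probability threshold, and the rule that a record predicted present gets at least one case are illustrative assumptions, not details taken from our pipeline.

```python
def hurdle_predict(p_present: float, count_if_present: float, threshold: float = 0.5) -> int:
    """Combine the two hurdle components into one predicted case count.

    p_present: probability of presence from the binary component (hypothetical input).
    count_if_present: case count from the count component (hypothetical input).
    threshold: assumed cut-off for calling the disease present.
    """
    # Component 1: below the hurdle, no cases are predicted.
    if p_present < threshold:
        return 0
    # Component 2: above the hurdle, use the count model. A present disease
    # is assumed to have at least one case, so the count is floored at 1.
    return max(1, round(count_if_present))
```

For example, `hurdle_predict(0.2, 10.0)` returns 0, while `hurdle_predict(0.8, 3.4)` returns 3. A common alternative is to report the expected count, `p_present * count_if_present`, instead of a thresholded one.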

Disease Status

disease status confusion matrix (full_model)
.metric   baseline  xgboost  desc
accuracy  0.85      0.96     proportion of the data that are predicted correctly
kap       0.44      0.89     Cohen's kappa: like accuracy, but normalized by the accuracy expected by chance alone; useful when one or more classes are much more frequent than the others
sens      0.99      0.98     sensitivity: proportion of actually positive samples that are predicted positive
spec      0.36      0.90     specificity: proportion of actually negative samples that are predicted negative
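The four metrics can be computed directly from the cells of a 2×2 confusion matrix. The Python sketch below uses plain integer counts; the function name `binary_metrics` and the toy counts are hypothetical, and which class counts as "positive" is an assumption that must match how the report's metrics were computed (the `.metric` and `kap` names suggest R's yardstick package, whose default treats the first factor level as the event).

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, Cohen's kappa, sensitivity, and specificity from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sens = tp / (tp + fn)  # share of actual positives predicted positive
    spec = tn / (tn + fp)  # share of actual negatives predicted negative
    # Chance agreement: probability that truth and prediction agree if they were
    # independent, computed from the actual-class and predicted-class totals.
    actual_pos, actual_neg = tp + fn, fp + tn
    pred_pos, pred_neg = tp + fp, fn + tn
    p_chance = (actual_pos * pred_pos + actual_neg * pred_neg) / n**2
    kap = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "kap": kap, "sens": sens, "spec": spec}


# Toy counts (hypothetical): 100 records, 50 of them actually positive.
print(binary_metrics(tp=40, fn=10, fp=5, tn=45))
```

With these toy counts, accuracy is 0.85, sensitivity 0.80, specificity 0.90, and chance agreement 0.50, so kappa is (0.85 − 0.50) / (1 − 0.50) = 0.70. Kappa falls below accuracy whenever much of the agreement could be expected by chance, for example when one class dominates; high sensitivity with low specificity means few positives are missed but many negatives are labeled positive.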
disease status confusion matrix by taxa
.metric   model     birds  buffaloes  camelidae  cats  cattle  cervidae  dogs  equidae  hares/rabbits  sheep/goats  swine
accuracy  baseline  0.85   0.76       0.770      0.77  0.86    0.730     0.80  0.91     0.86           0.86         0.87
accuracy  xgboost   0.95   0.96       0.960      0.97  0.96    0.960     0.95  0.97     0.96           0.96         0.96
kap       baseline  0.41   0.20       0.130      0.39  0.56    0.059     0.52  0.42     0.20           0.47         0.42
kap       xgboost   0.84   0.91       0.890      0.93  0.89    0.910     0.90  0.87     0.87           0.89         0.89
sens      baseline  0.98   1.00       1.000      1.00  0.99    1.000     0.99  0.99     0.99           0.99         0.99
sens      xgboost   0.98   0.97       0.970      0.97  0.97    0.970     0.96  0.99     0.98           0.98         0.98
spec      baseline  0.34   0.14       0.091      0.33  0.49    0.043     0.48  0.31     0.14           0.37         0.32
spec      xgboost   0.85   0.94       0.920      0.96  0.91    0.950     0.94  0.87     0.88           0.91         0.90
disease status confusion matrix by continent (the NA column groups records with no continent assigned)
.metric   model     Africa  Americas  Asia  Europe  NA    Oceania
accuracy  baseline  0.84    0.82      0.85  0.87    0.94  0.930
accuracy  xgboost   0.95    0.96      0.96  0.95    NA    0.990
kap       baseline  0.48    0.38      0.47  0.46    0.45  0.120
kap       xgboost   0.88    0.91      0.89  0.85    NA    0.930
sens      baseline  0.99    0.99      0.99  0.99    1.00  1.000
sens      xgboost   0.97    0.98      0.98  0.98    NA    1.000
spec      baseline  0.40    0.30      0.38  0.37    0.33  0.067
spec      xgboost   0.90    0.93      0.91  0.86    NA    0.910
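The per-taxa and per-continent tables repeat the same calculation within each group. A minimal Python sketch for accuracy, assuming records missing a group label are pooled into their own "NA" group as in the NA column; `accuracy_by_group` and its inputs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def accuracy_by_group(
    groups: Sequence[Optional[Hashable]],
    truth: Sequence[int],
    pred: Sequence[int],
) -> Dict[Hashable, float]:
    """Accuracy within each group; a missing group label (None) becomes "NA"."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for group, t, p in zip(groups, truth, pred):
        key = "NA" if group is None else group
        total[key] += 1
        correct[key] += int(t == p)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical records: continent, observed status, predicted status.
print(accuracy_by_group(["Asia", "Asia", None, "Europe"], [1, 0, 1, 1], [1, 1, 1, 0]))
# {'Asia': 0.5, 'NA': 1.0, 'Europe': 0.0}
```

The same loop extends to kap, sens, and spec by collecting per-group confusion-matrix counts instead of a single correct/total tally.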
disease status direction change confusion matrix (full_model)
.metric   baseline  xgboost  desc
accuracy  0.850     0.960    proportion of the data that are predicted correctly
kap       0.052     0.540    Cohen's kappa: like accuracy, but normalized by the accuracy expected by chance alone; useful when one or more classes are much more frequent than the others
sens      0.470     0.570    sensitivity: proportion of actually positive samples that are predicted positive
spec      0.680     0.810    specificity: proportion of actually negative samples that are predicted negative

Note that the baseline makes some “outbreak ends” predictions. These occur when the lag1 disease status is 1 but the lag1 case count is 0 or NA: the baseline predict() function carries the lag1 case count forward as its prediction only when the lag1 disease status is 1.
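One way to read the note above, as a Python sketch: if the baseline carries the lag1 case count forward only when the lag1 disease status is 1, and its predicted disease status is simply whether that count is positive, then a lag1 count of 0 or NA turns a present disease into a predicted absence, which the direction-change labels record as "outbreak ends". The status-from-count rule, the treatment of NA as 0, the function names, and the other two labels ("outbreak starts", "no change") are assumptions; only "outbreak ends" comes from the text.

```python
from typing import Optional


def baseline_cases(status_lag1: int, cases_lag1: Optional[int]) -> int:
    """Hypothetical baseline: carry the lag1 count forward only if status_lag1 is 1."""
    if status_lag1 == 1:
        # Assumption: a missing (NA) lag1 count is treated as 0 cases.
        return cases_lag1 if cases_lag1 is not None else 0
    return 0


def status_from_cases(cases: int) -> int:
    """Assumed rule: the disease is predicted present when the predicted count is positive."""
    return int(cases > 0)


def direction(status_lag1: int, status: int) -> str:
    """Label the change from the lag1 status to the current (or predicted) status."""
    if status_lag1 == 1 and status == 0:
        return "outbreak ends"
    if status_lag1 == 0 and status == 1:
        return "outbreak starts"  # hypothetical label
    return "no change"  # hypothetical label


for cases_lag1 in (12, 0, None):
    predicted = status_from_cases(baseline_cases(1, cases_lag1))
    print(cases_lag1, direction(1, predicted))
```

For a record whose lag1 status is 1, this prints "no change" for a lag1 count of 12 but "outbreak ends" for a count of 0 or None. Under these assumptions the baseline can never predict "outbreak starts", because a lag1 status of 0 always yields a predicted count of 0.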

disease status direction change confusion matrix by taxa
.metric   model     birds  buffaloes  camelidae  cats   cattle  cervidae  dogs   equidae  hares/rabbits  sheep/goats  swine
accuracy  baseline  0.850  0.760      0.770      0.770  0.860   0.7300    0.800  0.910    0.860          0.860        0.870
accuracy  xgboost   0.950  0.960      0.960      0.970  0.960   0.9600    0.950  0.970    0.960          0.960        0.960
kap       baseline  0.061  0.033      0.036      0.038  0.038   0.0076    0.045  0.091    0.043          0.054        0.058
kap       xgboost   0.390  0.660      0.640      0.720  0.520   0.7100    0.640  0.510    0.530          0.550        0.520
sens      baseline  0.440  0.580      0.560      0.570  0.430   0.5600    0.510  0.480    0.470          0.480        0.480
sens      xgboost   0.520  0.600      0.600      0.620  0.560   0.6300    0.600  0.560    0.580          0.560        0.550
spec      baseline  0.680  0.660      0.670      0.670  0.670   0.6400    0.670  0.700    0.680          0.680        0.680
spec      xgboost   0.760  0.860      0.850      0.880  0.800   0.8800    0.850  0.790    0.810          0.810        0.800
disease status direction change confusion matrix by continent (the NA column groups records with no continent assigned)
.metric   model     Africa  Americas  Asia   Europe  NA    Oceania
accuracy  baseline  0.840   0.820     0.850  0.87    0.94  0.930
accuracy  xgboost   0.950   0.960     0.960  0.95    NA    0.990
kap       baseline  0.034   0.028     0.057  0.09    0.12  0.038
kap       xgboost   0.550   0.570     0.550  0.48    NA    0.530
sens      baseline  0.450   0.470     0.470  0.47    0.50  0.610
sens      xgboost   0.560   0.570     0.580  0.56    NA    0.510
spec      baseline  0.670   0.670     0.680  0.69    0.69  0.700
spec      xgboost   0.810   0.820     0.810  0.79    NA    0.800
disease status variable importance and partial dependency (xgboost only)
[Figures not reproduced: disease status partial dependency plots (xgboost only), shown first by select disease and then by select direction change, for each of these predictors: disease_status_lag2, disease_status_lag1, ever_in_country_given_taxa, ever_in_country_any_taxa, cases_lag1_missing, cases_lag2, log_human_population, disease_status_lag3, cases_lag3, cases_lag_sum_border_countries, log_gdp_per_capita, cases_lag3_missing]
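Each partial dependency curve shows how the average xgboost prediction changes as one predictor is varied while the other predictors keep their observed values. A minimal Python sketch of that calculation, where `partial_dependence`, the toy `toy_predict` function, and the example rows are hypothetical stand-ins for the fitted model and data:

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over `rows` with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the chosen feature; all other predictors keep their values.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


# Toy stand-in for the fitted model (hypothetical coefficients).
def toy_predict(row: Row) -> float:
    return 0.5 * row["disease_status_lag1"] + 0.1 * row["log_gdp_per_capita"]


rows = [
    {"disease_status_lag1": 0.0, "log_gdp_per_capita": 1.0},
    {"disease_status_lag1": 1.0, "log_gdp_per_capita": 3.0},
]
print(partial_dependence(toy_predict, rows, "disease_status_lag1", [0.0, 1.0]))
```

Restricting `rows` to a subset, such as the records for one selected disease, gives one curve per subset; for the presence classifier, `predict` would return the predicted probability that the disease is present.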